Treating Dictionaries as a Linked-Data Corpus

Authors

Peter Bouda

Michael Cysouw

Abstract

In this paper we describe a practical approach to the challenge of linguistic retrodigitization. We propose to distinguish strictly between a base digitization and a separate interpretation of the sources. The base digitization includes only a literal electronic transcript of the source. All sources are thus simply treated as strings of characters, i.e. as unstructured corpora. The often complex structure found in many dictionaries and grammars will subsequently (and possibly much later) be added as Linked Data in the form of standoff annotation. A further advantage of this approach is that the complete digitization and interpretation can be performed collaboratively without a complex organizational superstructure.
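
The separation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sample dictionary line, the `Annotation` class, the `annotate` helper, and the labels are all hypothetical, chosen only to show how an interpretation can point into an unchanged base transcript by character offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation: it refers to the base text, never modifies it."""
    start: int  # inclusive character offset into the base text
    end: int    # exclusive character offset into the base text
    label: str  # hypothetical interpretation label, e.g. "headword"

    def text(self, base: str) -> str:
        """Return the span of the base text this annotation covers."""
        return base[self.start:self.end]


# Base digitization: a literal transcript, treated as a plain string.
# This dictionary line is invented for illustration.
BASE_TEXT = "casa n. house"


def annotate(base: str, substring: str, label: str) -> Annotation:
    """Build an annotation for the first occurrence of `substring`."""
    start = base.index(substring)
    return Annotation(start, start + len(substring), label)


# Interpretation layer: stored separately and possibly added much later.
annotations = [
    annotate(BASE_TEXT, "casa", "headword"),
    annotate(BASE_TEXT, "n.", "part_of_speech"),
    annotate(BASE_TEXT, "house", "translation"),
]

for a in annotations:
    print(a.label, a.start, a.end, a.text(BASE_TEXT))
```

Because the annotations only hold offsets and labels, several contributors could add or revise interpretations without touching the transcript. In a Linked Data setting each annotation would additionally carry an identifier such as a URI, which this sketch leaves out.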

Similar resources

Multilingual linked data

The interaction of natural language processing and the Semantic Web has led to the creation of a new paradigm known as Linguistic Linked Open Data (LLOD), whereby traditional language resources are made available as linked data. Conversely, the publication of corpora and machine-readable dictionaries as linked data has opened new resources to Semantic Web researchers and allowed new tools to be ...

X-Linked Lissencephaly with Absent Corpus Callosum and Ambiguous Genitalia: A Case Report

Background: X-linked lissencephaly with ambiguous genitalia (XLAG) is a recently described genetic disorder, in which patients present with lissencephaly, agenesis of the corpus callosum, refractory epilepsy of neonatal onset, acquired microcephaly, and male genotype with ambiguous genitalia. XLAG is responsible for a severe neurological disorder of neonatal onset in boys. A gyration defect con...

Optimized Selection of Intonation Dictionaries in Corpus Based Intonation Modelling

Data scarcity in corpus-based intonation modelling for TTS applications is addressed. We propose to apply a search process to a list of dictionaries of classes of intonation patterns previously trained from a corpus, to avoid problems associated with the small number of samples in the classes. Results indicate an improvement in comparison with previous alternatives where the ...

Color Dictionaries and Corpora

In the study of linguistics, a corpus is a data set of naturally occurring language (speech or writing) that can be used to generate or test linguistic hypotheses. The study of color naming worldwide has been carried out using three types of data sets: (1) corpora of empirical color-naming data collected from native speakers of many languages; (2) scholarly data sets where the color terms are o...

Machine-Readable Dictionaries in Text-to-Speech Systems

This paper presents the results of an experiment using machine-readable dictionaries (MRDs) and corpora for building concatenative units for text-to-speech (TTS) systems. Theoretical questions concerning the nature of phonemic data in dictionaries are raised; phonemic dictionary data is viewed as a representative corpus over which to extract n-gram phonemic frequencies in the language. Dict...

Publication date: 2012
